Code
# Load in packages and .csv files
library(tidyverse)
library(broom)
child_mortality_data <- read.csv(here::here("child_mortality_0_5_year_olds_dying_per_1000_born.csv"))
co2_pcap_cons <- read.csv(here::here("co2_pcap_cons.csv"))This report investigated the relationship between child mortality and carbon dioxide emissions per capita over the years of 1800 – 2022. Two data sets were acquired from Gapminder, which is an organization that seeks to educate the world’s transmitters of information with up-to-date facts about many metrics of health, infrastructure, sustainability, and many other crucial topics.
The first data set, called child_mortality_data, lists the rates of child death under the age of five per 1,000 births from 197 countries (data begins at 1950 for 12 countries). Each observation corresponds to a single country, and each variable is one year.
The second data set, called co2_pcap_cons, tracks tonnes (metric) of carbon dioxide emissions per capita for 194 countries. There are no NA values in this data set. Each observation corresponds to a single country, and each variable is one year.
The hypothesized relationship between the two variables is that as the amount of CO2 emissions per capita increases for a country, its child mortality rate will drop. While we are not investigating causality, we anticipate that increased CO2 emissions indicate increasing industrialization and modernization within a country, which tends to predict dropping levels of child mortality (Ranganathan et al., 2015).
# Load in packages and .csv files
library(tidyverse)
library(broom)
child_mortality_data <- read.csv(here::here("child_mortality_0_5_year_olds_dying_per_1000_born.csv"))
co2_pcap_cons <- read.csv(here::here("co2_pcap_cons.csv"))After loading the necessary packages and reading in the two Gapminder files, both data sets were cleaned for ease of analysis. Each data set had its year variables re-designated as characters to standardize the variable type. The X’s in front of each year (carried over from Gapminder) were removed, and each year was then re-designated as a numeric type. Each data set was then pivoted into a long format so that each observation corresponded to a single country for a single year, along with their respective child_mortality and emissions values. These two data sets were named child_mort_long and co2_long.
Finally, these two cleaned data sets were joined into one data set called final_data, which includes both child_mortality and emissions values for 194 countries over the years of 1800 – 2022.
# Convert co2_pcap_cons into long format
co2_long <- co2_pcap_cons %>%
mutate(across(matches("^X(18|19|20)\\d{2}$"), as.character)) %>%
pivot_longer(
cols = matches("^X(18|19|20)\\d{2}$"),
names_to = "year",
values_to = "emissions"
) %>%
mutate(
year = str_remove(year, "^X"),
year = as.numeric(year)
)
# Convert child_mortality_data into long format
child_mort_long <- child_mortality_data %>%
mutate(across(matches("^X(18|19|20|21)\\d{2}$"), as.character)) %>%
pivot_longer(
cols = matches("^X(18|19|20|21)\\d{2}$"),
names_to = "year",
values_to = "child_mortality"
) %>%
mutate(
year = str_remove(year, "^X"),
year = as.numeric(year)
)
# Join the data sets
final_data <- child_mort_long %>%
inner_join(co2_long, by = c("year", "country")) %>%
mutate(
child_mortality = as.numeric(as.character(child_mortality)),
emissions = as.numeric(as.character(emissions))
)In order to get a sense of how these two variables relate over time, child_mortality and emissions (log transformed) for each country were averaged for each decade starting from the year 1953 through the year 2022. A new variable called Decade was created for this purpose. The resulting mean values are represented in the faceted plot below. Each point represents the mean values over a decade of child_mortality and emissions per capita for a single country.
final_data <- final_data %>%
mutate(Decade = cut(year, breaks = seq(1953, 2023, 10),
labels = c("1953-1962", "1963-1972",
"1973-1982", "1983-1992",
"1993-2002", "2003-2012",
"2013-2022"),
include.lowest = TRUE))
final_data_means <- final_data %>%
filter(!is.na(Decade)) %>%
group_by(country, Decade) %>%
summarize(
mean_emissions = mean(emissions, na.rm = TRUE),
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
.groups = 'drop'
)
ggplot(final_data_means, aes(x = mean_emissions,
y = mean_child_mortality)) +
geom_point() +
facet_wrap(Decade ~ ., scales = "free") +
labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
x = "Mean Emissions Per Capita (Log Transformed)",
y = "") +
scale_x_log10() +
theme_bw()# Un-logged
ggplot(final_data_means, aes(x = mean_emissions,
y = mean_child_mortality)) +
geom_point() +
facet_wrap(Decade ~ ., scales = "free") +
labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
x = "Mean Emissions Per Capita",
y = "") +
theme_bw()Each facet in the above plot clearly shows that the global correlation between child_mortality and emissions is negative in every decade since 1953. This consistent pattern supports the hypothesis of an inverse relationship between these variables and provides a solid foundation for further analysis.
To evaluate the relationship between mean_child_mortality and mean_emissions, the final_data set was processed to compute yearly global averages for these variables, resulting in a dataset named yearly_means. Then, a scatterplot was generated, where each point represents the global averages for a given year. An overlaid line of best fit reveals a pronounced negative correlation, indicating that years with higher emissions are generally associated with lower child mortality rates, and vice versa.
# NEED TO DISCUSS - THIS PLOT USES THE WRONG VARIABLES
final_data %>%
group_by(country) %>%
summarise(
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
mean_emissions = mean(emissions, na.rm = TRUE)
) %>%
ggplot(aes(x = mean_emissions, y = mean_child_mortality)) +
geom_point() +
theme_bw() +
labs(x = "Mean Emissions (Log Transformed)", y = "",
title = "Scatter Plot of Mean Child Mortality vs Mean Emissions by Country") +
geom_smooth(method = "lm", color = "blue") +
scale_x_log10()yearly_means <- final_data %>%
group_by(year) %>%
summarise(
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
mean_emissions = mean(emissions, na.rm = TRUE)
)
yearly_means %>%
mutate(period = 25 * floor((year - 1800) / 25) + 1800) %>%
ggplot(aes(x = mean_emissions, y = mean_child_mortality, color = as.factor(period))) +
geom_point() +
theme_bw() +
labs(x = "Mean Emissions (Log Transformed)", y = "Mean Child Mortality",
title = "Scatter Plot of Mean Child Mortality vs Mean Emissions by Period",
color = "Period") +
geom_smooth(method = "lm", color = "blue") +
scale_x_log10() +
scale_color_viridis_d()# Un-logged
yearly_means %>%
mutate(period = 25 * floor((year - 1800) / 25) + 1800) %>%
ggplot(aes(x = mean_emissions, y = mean_child_mortality, color = as.factor(period))) +
geom_point() +
theme_bw() +
labs(x = "Mean Emissions", y = "Mean Child Mortality",
title = "Scatter Plot of Mean Child Mortality vs Mean Emissions by Period",
color = "Period") +
geom_smooth(method = "lm", color = "blue") +
scale_color_viridis_d()To extract the linear equation from the above plot, a linear model was created with mean_emissions as the explanatory variable, and mean_child_mortality as the response variable. The resulting table shows the linear analysis.
The regression equation for the model is \(\hat{y}=403.29298−74.17116⋅x\), where \(\hat{y}\) is equal to the estimated mean_child_mortality, and \(x\) is equal to the mean_emissions.
final_data_lm <- lm(mean_child_mortality ~ mean_emissions,
data = yearly_means)
tidy(final_data_lm)# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 403. 2.88 140. 8.87e-218
2 mean_emissions -74.2 1.17 -63.4 1.02e-143
yearly_means_log <- yearly_means %>%
mutate(log_mean_emissions = log(mean_emissions))
final_data_lm_log <- lm(mean_child_mortality ~ log_mean_emissions, data = yearly_means_log)tidy(final_data_lm_log)# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 240. 3.58 67.1 6.43e-149
2 log_mean_emissions -85.4 2.17 -39.4 5.44e-102
r_yearly_log <- yearly_means_log %>%
summarize(pearsons_r = cor(mean_child_mortality,
log_mean_emissions, method = "pearson")) %>%
pull(pearsons_r)final_data_lm_log %>%
augment() %>%
ggplot(aes(x = .resid)) +
geom_histogram(aes(y = ..density..), binwidth = 1, colour = "black", fill = "white") +
geom_density(alpha = .2, fill = "#FF6666") +
labs(x = "Residuals", y = "Density",
title = "Histogram of Residuals with Normal Density Curve") +
theme_bw()final_data_lm_log %>%
augment() %>%
summarise(
response_variance = var(mean_child_mortality, na.rm = TRUE),
fitted_variance = var(.fitted, na.rm = TRUE),
residuals_variance = var(.resid, na.rm = TRUE)
) %>%
pivot_longer(
everything(),
names_to = "Metric",
values_to = "Variance"
) %>%
mutate(
Metric = recode(Metric,
'response_variance' = "Variance of Response",
'fitted_variance' = "Variance of Fitted Values",
'residuals_variance' = "Variance of Residuals"
)
)# A tibble: 3 × 2
Metric Variance
<chr> <dbl>
1 Variance of Response 20559.
2 Variance of Fitted Values 18002.
3 Variance of Residuals 2558.
fitted_values_log <- predict(final_data_lm_log)
st_error_log <- sigma(final_data_lm_log)
simulated_data_log <- data.frame(predicted = fitted_values_log) %>%
mutate(simulated_child_mortality = exp(predicted + rnorm(n(), mean = 0, sd = st_error_log)))
ggplot(simulated_data_log, aes(x = predicted, y = simulated_child_mortality)) +
geom_point()In order to conclude with confidence that appropriateness of using a linear regression for this relationship, the four model conditions of linearity, independence, normality of distribution, and equality of variance must be assessed. Each will be discussed in turn:
r_yearly <- yearly_means %>%
summarize(pearsons_r = cor(mean_emissions,
mean_child_mortality, method = "pearson"))
glance(final_data_lm)# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.948 0.948 32.8 4016. 1.02e-143 1 -1094. 2194. 2204.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
1. Linearity:
By looking at the above scatterplot from section III.a, a clear linear pattern is evident. The Pearson’s correlation coefficient (r) is -0.9735707. Given these observations, the first criteria failed to be violated.
2. Independence:
The data were intentionally aggregated to calculate global rates of child mortality and emissions per capita, averaging the values without regard to their country of origin. However, considering that each year’s data likely has a significant causal influence on the data of the following year, this model may be at risk of violating the independence assumption. The temporal nature of the data suggests that observations across years are not entirely independent, which could impact the validity of the independence assumption in our analysis.
3. Normality of residuals:
The histogram below displays the distribution of residuals from the model. The distribution is approximately centered around zero, suggesting that the model does not have systematic bias. The overall shape of the distribution approximates a bell curve, but with a pronounced peak (leptokurtosis) and a moderate right skew. Nonetheless, this does not necessarily violate the assumption of normality, especially considering the large sample size.
final_data_lm |>
augment() |>
ggplot(aes(x = .fitted, y = .resid)) +
geom_point()# Histogram
final_data_lm %>%
augment() %>%
ggplot(aes(x = .resid)) +
geom_histogram(aes(y = ..density..), binwidth = 1,
colour = "black", fill = "white") +
geom_density(alpha = .2, fill = "#FF6666") +
labs(x = "Residuals", y = "",
title = "Histogram of Residuals with Normal Density Curve") +
theme_bw()4. Equal variance of residuals:
The scatterplot below illustrates the distribution of residuals against predicted values, providing insight into the homoscedasticity of the model. It appears that there is an increase in the spread of residuals as the predicted values rise. This pattern suggests a potential violation of the equal variance assumption, which could compromise the reliability of the linear regression analysis.
final_data_lm %>%
augment() %>%
summarise(
response_variance = var(mean_child_mortality, na.rm = TRUE),
fitted_variance = var(.fitted, na.rm = TRUE),
residuals_variance = var(.resid, na.rm = TRUE)
) %>%
pivot_longer(
everything(),
names_to = "Metric",
values_to = "Variance"
) %>%
mutate(
Metric = case_when(
Metric == "response_variance" ~ "Variance of Response",
Metric == "fitted_variance" ~ "Variance of Fitted Values",
Metric == "residuals_variance" ~ "Variance of Residuals",
TRUE ~ Metric # Fallback, should not be needed
)
)# A tibble: 3 × 2
Metric Variance
<chr> <dbl>
1 Variance of Response 20559.
2 Variance of Fitted Values 19487.
3 Variance of Residuals 1072.
predictions <- predict(final_data_lm)
st_error <- sigma(final_data_lm)
predictions %>%
data.frame(predicted = .) %>%
mutate(residuals = rnorm(n(), mean = 0, sd = st_error)) %>%
ggplot(aes(x = predicted, y = residuals)) +
geom_point() +
theme_bw() +
labs(x = "Predicted values", y = "",
title = "Residuals vs. Predicted Values")| To be incorporated: |
| The Adjusted R-squared was 0.9476 for the data indicating that close to 95% of the variability in mean child mortality can be explained by mean CO2 emissions. This indicates that the model is a very good fit for the data. The only issue is the plot of the residuals, since the residuals do not have a random distribution and seem to follow a pattern. This could indicate that the relationship between CO2 emissions and child mortality is non-linear and could lead to issues with other tests on the data. |
noise <- function(x, mean = 0, sd) {
return(x + rnorm(length(x),
mean,
sd)
)
}
simulations <- map_dfc(.x = 1:1000,
.f = ~tibble(simulation = noise(predictions,
sd = st_error)
)
)colnames(simulations) <- colnames(simulations) |>
str_replace(pattern = "\\.\\.\\.",
replace = "_")
binded_simulations <- yearly_means |>
filter(!is.na(mean_child_mortality),
!is.na(mean_emissions)
) |>
select(mean_child_mortality) |>
bind_cols(simulations)
head(binded_simulations)simulated_r_sq <- binded_simulations |>
map(~ lm(mean_child_mortality ~ .x, data = binded_simulations)
) |>
map(glance) |>
map_dbl(~ .x$r.squared)
head(simulated_r_sq)mean_child_mortality simulation_1 simulation_2
1.0000000 0.9030751 0.9210450
simulation_3 simulation_4 simulation_5
0.9050067 0.8952918 0.8989057
tibble(simulations = simulated_r_sq) |>
ggplot(aes(x = simulations)) +
geom_histogram(binwidth = 0.001, color = "black", fill = "skyblue") +
labs(x = expression("Simulated"~ R^2),
y = "",
subtitle = "Number of Simulated Models") +
theme_bw() +
scale_y_continuous(breaks = seq(0,
60,
by = 2)
)[text here]
Ranganathan et al., 2015 https://www.nature.com/articles/palcomms201533
library(knitr)
library(kableExtra)
# Averaging across all years for each country
means_data <- final_data %>%
group_by(country) %>%
summarise(
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
mean_emissions = mean(emissions, na.rm = TRUE)
) %>%
mutate(
log_mean_emissions = log(mean_emissions)
) %>%
ungroup()
means_data_kable <- kable(means_data, "html", escape = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F, position = "left") %>%
scroll_box(width = "100%", height = "500px")
means_data_kable| country | mean_child_mortality | mean_emissions | log_mean_emissions |
|---|---|---|---|
| Afghanistan | 386.25740 | 0.0466368 | -3.0653660 |
| Albania | 273.38637 | 0.4961345 | -0.7009082 |
| Algeria | 337.47623 | 0.7639058 | -0.2693108 |
| Andorra | 19.21055 | 2.7854933 | 1.0244250 |
| Angola | 389.17399 | 0.2111839 | -1.5550262 |
| Antigua and Barbuda | 198.28354 | 1.5734978 | 0.4533010 |
| Argentina | 236.32422 | 1.4339731 | 0.3604490 |
| Armenia | 229.97534 | 0.7998879 | -0.2232837 |
| Australia | 164.54354 | 5.2926009 | 1.6663098 |
| Austria | 225.58197 | 4.2965874 | 1.4578211 |
| Azerbaijan | 240.90448 | 1.7380404 | 0.5527582 |
| Bahamas | 182.38238 | 3.5361121 | 1.2630278 |
| Bahrain | 311.77955 | 4.4071166 | 1.4832206 |
| Bangladesh | 387.63767 | 0.0727758 | -2.6203720 |
| Barbados | 365.35740 | 0.9733587 | -0.0270026 |
| Belarus | 223.34018 | 2.0042735 | 0.6952817 |
| Belgium | 166.45704 | 8.4054709 | 2.1288828 |
| Belize | 215.76906 | 0.4507937 | -0.7967454 |
| Benin | 349.71256 | 0.1069013 | -2.2358489 |
| Bhutan | 349.52960 | 0.1370269 | -1.9875780 |
| Bolivia | 316.47354 | 0.3524843 | -1.0427492 |
| Bosnia and Herzegovina | 259.09130 | 1.2985919 | 0.2612805 |
| Botswana | 291.26413 | 0.5983901 | -0.5135123 |
| Brazil | 297.82691 | 0.5622511 | -0.5758067 |
| Brunei | 289.50368 | 6.3411614 | 1.8470619 |
| Bulgaria | 221.15511 | 1.8445516 | 0.6122362 |
| Burkina Faso | 374.30762 | 0.0302556 | -3.4980738 |
| Burundi | 338.23632 | 0.0125561 | -4.3775524 |
| Cambodia | 301.53453 | 0.1707220 | -1.7677189 |
| Cameroon | 361.40762 | 0.0793677 | -2.5336636 |
| Canada | 180.01933 | 7.2390493 | 1.9794899 |
| Cape Verde | 292.17982 | 0.3029641 | -1.1941409 |
| Central African Republic | 356.56054 | 0.0342915 | -3.3728584 |
| Chad | 356.25112 | 0.0196637 | -3.9289821 |
| Chile | 299.68090 | 1.1648655 | 0.1526056 |
| China | 305.78283 | 0.8371480 | -0.1777544 |
| Colombia | 274.75874 | 0.5669058 | -0.5675621 |
| Comoros | 329.61973 | 0.0583049 | -2.8420686 |
| Congo, Dem. Rep. | 341.65291 | 0.0352735 | -3.3446221 |
| Congo, Rep. | 306.12960 | 0.1779821 | -1.7260725 |
| Costa Rica | 278.65794 | 0.5615336 | -0.5770836 |
| Cote d'Ivoire | 346.98834 | 0.1607130 | -1.8281351 |
| Croatia | 238.03789 | 1.3277578 | 0.2834917 |
| Cuba | 239.56677 | 0.7723139 | -0.2583642 |
| Cyprus | 209.90538 | 1.9542960 | 0.6700300 |
| Czech Republic | 220.15224 | 5.2377982 | 1.6559012 |
| Denmark | 147.32538 | 4.5645919 | 1.5183291 |
| Djibouti | 342.72825 | 0.2611031 | -1.3428398 |
| Dominica | 53.66575 | 0.3456143 | -1.0624317 |
| Dominican Republic | 312.21973 | 0.5868655 | -0.5329597 |
| Ecuador | 292.73498 | 0.5138206 | -0.6658810 |
| Egypt | 317.43543 | 0.4625695 | -0.7709584 |
| El Salvador | 326.72377 | 0.2956413 | -1.2186085 |
| Equatorial Guinea | 359.86816 | 0.7619910 | -0.2718205 |
| Eritrea | 345.57489 | 0.0729238 | -2.6183407 |
| Estonia | 205.93753 | 4.4824439 | 1.5001684 |
| Eswatini | 313.37937 | 0.2373857 | -1.4380692 |
| Ethiopia | 357.23049 | 0.0209955 | -3.8634464 |
| Fiji | 329.91076 | 0.4592780 | -0.7780995 |
| Finland | 186.78229 | 3.8995740 | 1.3608673 |
| France | 162.37700 | 4.0272780 | 1.3930907 |
| Gabon | 325.99283 | 1.2170448 | 0.1964257 |
| Gambia | 372.29821 | 0.0687848 | -2.6767732 |
| Georgia | 233.11040 | 0.8167220 | -0.2024565 |
| Germany | 229.78448 | 6.3502646 | 1.8484965 |
| Ghana | 349.55471 | 0.1608565 | -1.8272426 |
| Greece | 208.14655 | 1.8874978 | 0.6352520 |
| Grenada | 206.47848 | 0.3923049 | -0.9357159 |
| Guatemala | 363.13229 | 0.2333453 | -1.4552360 |
| Guinea | 369.68475 | 0.0740448 | -2.6030844 |
| Guinea-Bissau | 348.37444 | 0.0464484 | -3.0694126 |
| Guyana | 271.49417 | 0.7439776 | -0.2957444 |
| Haiti | 375.11749 | 0.0527175 | -2.9428080 |
| Honduras | 288.90807 | 0.2420179 | -1.4187434 |
| Hong Kong, China | 21.71986 | 2.9010045 | 1.0650571 |
| Hungary | 240.81704 | 2.6974664 | 0.9923130 |
| Iceland | 180.74794 | 2.8275247 | 1.0394017 |
| India | 351.83229 | 0.2609552 | -1.3434067 |
| Indonesia | 327.94081 | 0.3532466 | -1.0405888 |
| Iran | 377.38341 | 1.1978430 | 0.1805225 |
| Iraq | 313.98161 | 1.1182063 | 0.1117259 |
| Ireland | 152.76081 | 3.4560807 | 1.2401352 |
| Israel | 253.29700 | 2.6546502 | 0.9763129 |
| Italy | 243.21148 | 2.5398744 | 0.9321146 |
| Jamaica | 230.29552 | 0.8638565 | -0.1463486 |
| Japan | 219.69170 | 3.0719865 | 1.1223244 |
| Jordan | 298.61031 | 0.9714664 | -0.0289486 |
| Kazakhstan | 317.96915 | 4.3394350 | 1.4677442 |
| Kenya | 373.66502 | 0.1628700 | -1.8148032 |
| Kiribati | 272.47489 | 0.1575874 | -1.8477748 |
| Kuwait | 342.07493 | 9.7526906 | 2.2775432 |
| Kyrgyz Republic | 263.57175 | 1.1759193 | 0.1620502 |
| Lao | 329.20762 | 0.1677175 | -1.7854743 |
| Latvia | 217.02982 | 1.9157354 | 0.6501016 |
| Lebanon | 299.93682 | 0.8933408 | -0.1127871 |
| Lesotho | 309.94036 | 0.2969283 | -1.2142647 |
| Liberia | 353.10852 | 0.1338700 | -2.0108864 |
| Libya | 302.45426 | 2.2372018 | 0.8052259 |
| Liechtenstein | 21.24575 | 2.5136143 | 0.9217217 |
| Lithuania | 251.93484 | 2.4752242 | 0.9063310 |
| Luxembourg | 202.32336 | 9.7173722 | 2.2739152 |
| Madagascar | 338.22915 | 0.0517175 | -2.9619593 |
| Malawi | 366.96682 | 0.0703812 | -2.6538296 |
| Malaysia | 291.41170 | 1.2340269 | 0.2102827 |
| Maldives | 324.39493 | 0.3335291 | -1.0980250 |
| Mali | 412.50090 | 0.0243901 | -3.7135766 |
| Malta | 224.87215 | 2.7239507 | 1.0020833 |
| Marshall Islands | 67.14795 | 0.5554978 | -0.5878907 |
| Mauritania | 326.60269 | 0.1430628 | -1.9444717 |
| Mauritius | 242.85650 | 0.8357713 | -0.1794003 |
| Mexico | 326.68655 | 1.3734305 | 0.3173116 |
| Micronesia, Fed. Sts. | 215.15202 | 0.3242287 | -1.1263061 |
| Moldova | 231.82287 | 1.2841704 | 0.2501129 |
| Mongolia | 318.70000 | 1.4798117 | 0.3919148 |
| Montenegro | 246.16987 | 0.8760897 | -0.1322868 |
| Morocco | 303.78565 | 0.3517982 | -1.0446975 |
| Mozambique | 357.83274 | 0.1464305 | -1.9212044 |
| Myanmar | 333.33946 | 0.1638206 | -1.8089832 |
| Namibia | 293.19731 | 0.4525381 | -0.7928833 |
| Nauru | 69.70000 | 3.4392242 | 1.2352459 |
| Nepal | 325.18072 | 0.0465112 | -3.0680619 |
| Netherlands | 164.92390 | 4.9606682 | 1.6015404 |
| New Zealand | 144.92924 | 3.2775067 | 1.1870830 |
| Nicaragua | 357.10045 | 0.2547892 | -1.3673186 |
| Niger | 370.58296 | 0.0228610 | -3.7783235 |
| Nigeria | 363.45291 | 0.1572511 | -1.8499113 |
| North Korea | 332.15157 | 1.2094664 | 0.1901792 |
| North Macedonia | 258.76874 | 1.7188565 | 0.5416592 |
| Norway | 122.41036 | 3.1672063 | 1.1528499 |
| Oman | 322.01018 | 2.2998296 | 0.8328350 |
| Pakistan | 382.93812 | 0.1901839 | -1.6597640 |
| Palau | 37.85342 | 3.6630762 | 1.2983033 |
| Palestine | 307.80807 | 0.1181749 | -2.1355897 |
| Panama | 271.92108 | 0.4641250 | -0.7676014 |
| Papua New Guinea | 315.19013 | 0.1545830 | -1.8670244 |
| Paraguay | 254.11031 | 0.2817265 | -1.2668187 |
| Peru | 282.35157 | 0.5690000 | -0.5638748 |
| Philippines | 298.00135 | 0.3092466 | -1.1736161 |
| Poland | 231.20291 | 3.6753722 | 1.3016544 |
| Portugal | 235.22480 | 1.6005650 | 0.4703567 |
| Qatar | 297.75655 | 7.3184753 | 1.9904020 |
| Romania | 249.18300 | 1.7500807 | 0.5596619 |
| Russia | 269.54960 | 3.4704484 | 1.2442838 |
| Rwanda | 329.05605 | 0.0239462 | -3.7319461 |
| Samoa | 218.51166 | 0.2134529 | -1.5443390 |
| Sao Tome and Principe | 254.96816 | 0.1302287 | -2.0384631 |
| Saudi Arabia | 307.46054 | 3.9673274 | 1.3780927 |
| Senegal | 395.24439 | 0.1673049 | -1.7879372 |
| Serbia | 217.43511 | 1.5843004 | 0.4601430 |
| Seychelles | 181.79686 | 0.9933677 | -0.0066544 |
| Sierra Leone | 429.65291 | 0.0647040 | -2.7379317 |
| Singapore | 263.00704 | 6.4327489 | 1.8614020 |
| Slovak Republic | 225.74915 | 4.5391211 | 1.5127334 |
| Slovenia | 222.98291 | 2.5129238 | 0.9214469 |
| Solomon Islands | 357.53587 | 0.1097085 | -2.2099282 |
| Somalia | 362.48879 | 0.0270852 | -3.6087678 |
| South Africa | 293.02466 | 2.5586906 | 0.9394956 |
| South Korea | 334.21139 | 2.1194753 | 0.7511686 |
| South Sudan | 398.13184 | 0.0370448 | -3.2956261 |
| Spain | 231.76623 | 1.9819283 | 0.6840702 |
| Sri Lanka | 272.26969 | 0.2886413 | -1.2425707 |
| St. Kitts and Nevis | 59.94247 | 0.8058386 | -0.2158718 |
| St. Lucia | 220.13901 | 0.4149238 | -0.8796605 |
| St. Vincent and the Grenadines | 230.02691 | 0.3193408 | -1.1414964 |
| Sudan | 319.03094 | 0.1166996 | -2.1481526 |
| Suriname | 273.07309 | 1.4609686 | 0.3790996 |
| Sweden | 128.63444 | 3.7264305 | 1.3154508 |
| Switzerland | 166.26103 | 4.1204439 | 1.4159609 |
| Syria | 303.85067 | 0.6301390 | -0.4618148 |
| Taiwan | 296.09933 | 2.4094574 | 0.8794016 |
| Tajikistan | 255.55426 | 0.5052063 | -0.6827885 |
| Tanzania | 325.12556 | 0.0730628 | -2.6164362 |
| Thailand | 302.30291 | 0.5908296 | -0.5262276 |
| Timor-Leste | 363.87668 | 0.0569910 | -2.8648614 |
| Togo | 336.44709 | 0.0909552 | -2.3973887 |
| Tonga | 190.89417 | 0.2493049 | -1.3890785 |
| Trinidad and Tobago | 223.89283 | 4.8734843 | 1.5838091 |
| Tunisia | 338.26951 | 0.4442825 | -0.8112946 |
| Turkey | 290.41529 | 0.9963004 | -0.0037064 |
| Turkmenistan | 330.32287 | 4.0312960 | 1.3940879 |
| Tuvalu | 68.19726 | 0.2638565 | -1.3323499 |
| UAE | 309.10951 | 5.5005561 | 1.7048492 |
| UK | 153.93148 | 8.0481614 | 2.0854437 |
| USA | 177.84022 | 10.0326951 | 2.3058493 |
| Uganda | 386.88161 | 0.0422108 | -3.1650801 |
| Ukraine | 239.93363 | 2.4806592 | 0.9085243 |
| Uruguay | 242.11081 | 0.7128924 | -0.3384248 |
| Uzbekistan | 269.65426 | 1.9439327 | 0.6647131 |
| Vanuatu | 218.91480 | 0.2458475 | -1.4030437 |
| Venezuela | 269.91480 | 1.9177803 | 0.6511684 |
| Vietnam | 287.26144 | 0.2582825 | -1.3537013 |
| Yemen | 422.88161 | 0.2103677 | -1.5588983 |
| Zambia | 319.54350 | 0.5254260 | -0.6435459 |
| Zimbabwe | 292.04709 | 0.6607399 | -0.4143950 |
# Over time scatterplot
final_data <- final_data %>%
mutate(Decade = cut(year, breaks = seq(1953, 2023, 10),
labels = c("1953-1962", "1963-1972",
"1973-1982", "1983-1992",
"1993-2002", "2003-2012",
"2013-2022"),
include.lowest = TRUE))
final_data_means <- final_data %>%
filter(!is.na(Decade)) %>%
group_by(country, Decade) %>%
summarize(
mean_emissions = mean(emissions, na.rm = TRUE),
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
.groups = 'drop'
)
ggplot(final_data_means, aes(x = mean_emissions,
y = mean_child_mortality)) +
geom_point() +
facet_wrap(Decade ~ ., scales = "free") +
labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
x = "Mean Emissions Per Capita (Log Transformed)",
y = "") +
scale_x_log10() +
theme_bw()# Snapshot Scatterplot
means_data %>%
ggplot(aes(x = log_mean_emissions, y = mean_child_mortality)) +
geom_point() +
theme_bw() +
labs(x = "Log of Mean Emissions", y = "",
title = "Scatter Plot of Mean Child Mortality vs Log of Mean Emissions by Country") +
geom_smooth(method = "lm", color = "blue")# New linear model
means_data_lm <- lm(mean_child_mortality ~ log_mean_emissions,
data = means_data)
tidy(means_data_lm)# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 260. 4.93 52.7 4.32e-116
2 log_mean_emissions -31.3 2.98 -10.5 1.07e- 20
# New scatterplot and histogram for residuals
means_data_lm %>%
augment() %>%
ggplot(aes(x = .fitted, y = .resid)) +
geom_point()means_data_lm %>%
augment() %>%
ggplot(aes(x = .resid)) +
geom_histogram(aes(y = ..density..), colour = "black", fill = "white") + # Removed binwidth
geom_density(alpha = .2, fill = "#FF6666") +
labs(x = "Residuals", y = "Density",
title = "Histogram of Residuals with Normal Density Curve") +
theme_bw()# New variances table
means_data_lm %>%
augment() %>%
summarise(
response_variance = var(mean_child_mortality, na.rm = TRUE),
fitted_variance = var(.fitted, na.rm = TRUE),
residuals_variance = var(.resid, na.rm = TRUE)
) %>%
pivot_longer(
everything(),
names_to = "Metric",
values_to = "Variance"
) %>%
mutate(
Metric = case_when(
Metric == "response_variance" ~ "Variance of Response",
Metric == "fitted_variance" ~ "Variance of Fitted Values",
Metric == "residuals_variance" ~ "Variance of Residuals",
TRUE ~ Metric # Fallback, should not be needed
)
)# A tibble: 3 × 2
Metric Variance
<chr> <dbl>
1 Variance of Response 6730.
2 Variance of Fitted Values 2458.
3 Variance of Residuals 4272.
predictions_means <- predict(means_data_lm)
st_error_means <- sigma(means_data_lm)
predictions_means %>%
data.frame(predicted = .) %>%
mutate(residuals = rnorm(n(), mean = 0, sd = st_error_means)) %>%
ggplot(aes(x = predicted, y = residuals)) +
geom_point() +
theme_bw() +
labs(x = "Predicted values", y = "",
title = "Residuals vs. Predicted Values")predictions_means <- predict(means_data_lm)
# Generate simulated residuals
simulated_residuals_means <- rnorm(length(predictions_means), mean = 0, sd = sigma(means_data_lm))
# Create a dataframe with predicted values and simulated residuals
simulated_data_means <- data.frame(predicted = predictions_means, residuals = simulated_residuals_means)
# Plot simulated residuals vs predicted values to assess variance
ggplot(simulated_data_means, aes(x = predicted, y = residuals)) +
geom_point() +
theme_bw() +
labs(x = "Predicted Values", y = "Simulated Response Values",
title = "Simulated Response Values vs. Predicted Values")noise <- function(x, mean = 0, sd) {
return(x + rnorm(length(x), mean, sd))
}
# Generate 1000 simulations
simulations_means <- map_dfc(.x = 1:1000,
.f = ~tibble(simulation = noise(predictions_means,
sd = st_error_means)
)
)colnames(simulations_means) <- str_replace(colnames(simulations_means), pattern = "\\.{3}", replace = "_")
binded_simulations_means <- means_data %>%
filter(!is.na(mean_child_mortality), !is.na(log_mean_emissions)) %>%
select(country, mean_child_mortality, log_mean_emissions) %>%
bind_cols(simulations_means)
head(binded_simulations_means)# A tibble: 6 × 1,003
country mean_child_mortality log_mean_emissions simulation_1 simulation_2
<chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan 386. -3.07 351. 333.
2 Albania 273. -0.701 375. 305.
3 Algeria 337. -0.269 239. 330.
4 Andorra 19.2 1.02 226. 181.
5 Angola 389. -1.56 327. 247.
6 Antigua and… 198. 0.453 258. 275.
# ℹ 998 more variables: simulation_3 <dbl>, simulation_4 <dbl>,
# simulation_5 <dbl>, simulation_6 <dbl>, simulation_7 <dbl>,
# simulation_8 <dbl>, simulation_9 <dbl>, simulation_10 <dbl>,
# simulation_11 <dbl>, simulation_12 <dbl>, simulation_13 <dbl>,
# simulation_14 <dbl>, simulation_15 <dbl>, simulation_16 <dbl>,
# simulation_17 <dbl>, simulation_18 <dbl>, simulation_19 <dbl>,
# simulation_20 <dbl>, simulation_21 <dbl>, simulation_22 <dbl>, …
simulation_cols_means <- names(binded_simulations_means)[-(1:3)]
simulated_r_sq_means <- map_dbl(simulation_cols_means, ~ {
formula <- as.formula(paste("mean_child_mortality ~", .x))
model <- lm(formula, data = binded_simulations_means)
summary(model)$r.squared
})
tibble(simulations = simulated_r_sq_means) %>%
ggplot(aes(x = simulations)) +
geom_histogram(binwidth = 0.001, color = "black", fill = "skyblue") +
labs(x = expression("Simulated"~ R^2),
y = "Frequency",
subtitle = "Histogram of Simulated R-squared Values") +
theme_bw() +
scale_y_continuous()library(knitr)
library(kableExtra)
# Averaging across all years for each country
means_data <- final_data %>%
group_by(country) %>%
summarise(
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
mean_emissions = mean(emissions, na.rm = TRUE)
) %>%
mutate(
log_mean_emissions = log(mean_emissions)
) %>%
ungroup()
means_data_kable <- kable(means_data, "html", escape = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F, position = "left") %>%
scroll_box(width = "100%", height = "500px")
means_data_kable| country | mean_child_mortality | mean_emissions | log_mean_emissions |
|---|---|---|---|
| Afghanistan | 386.25740 | 0.0466368 | -3.0653660 |
| Albania | 273.38637 | 0.4961345 | -0.7009082 |
| Algeria | 337.47623 | 0.7639058 | -0.2693108 |
| Andorra | 19.21055 | 2.7854933 | 1.0244250 |
| Angola | 389.17399 | 0.2111839 | -1.5550262 |
| Antigua and Barbuda | 198.28354 | 1.5734978 | 0.4533010 |
| Argentina | 236.32422 | 1.4339731 | 0.3604490 |
| Armenia | 229.97534 | 0.7998879 | -0.2232837 |
| Australia | 164.54354 | 5.2926009 | 1.6663098 |
| Austria | 225.58197 | 4.2965874 | 1.4578211 |
| Azerbaijan | 240.90448 | 1.7380404 | 0.5527582 |
| Bahamas | 182.38238 | 3.5361121 | 1.2630278 |
| Bahrain | 311.77955 | 4.4071166 | 1.4832206 |
| Bangladesh | 387.63767 | 0.0727758 | -2.6203720 |
| Barbados | 365.35740 | 0.9733587 | -0.0270026 |
| Belarus | 223.34018 | 2.0042735 | 0.6952817 |
| Belgium | 166.45704 | 8.4054709 | 2.1288828 |
| Belize | 215.76906 | 0.4507937 | -0.7967454 |
| Benin | 349.71256 | 0.1069013 | -2.2358489 |
| Bhutan | 349.52960 | 0.1370269 | -1.9875780 |
| Bolivia | 316.47354 | 0.3524843 | -1.0427492 |
| Bosnia and Herzegovina | 259.09130 | 1.2985919 | 0.2612805 |
| Botswana | 291.26413 | 0.5983901 | -0.5135123 |
| Brazil | 297.82691 | 0.5622511 | -0.5758067 |
| Brunei | 289.50368 | 6.3411614 | 1.8470619 |
| Bulgaria | 221.15511 | 1.8445516 | 0.6122362 |
| Burkina Faso | 374.30762 | 0.0302556 | -3.4980738 |
| Burundi | 338.23632 | 0.0125561 | -4.3775524 |
| Cambodia | 301.53453 | 0.1707220 | -1.7677189 |
| Cameroon | 361.40762 | 0.0793677 | -2.5336636 |
| Canada | 180.01933 | 7.2390493 | 1.9794899 |
| Cape Verde | 292.17982 | 0.3029641 | -1.1941409 |
| Central African Republic | 356.56054 | 0.0342915 | -3.3728584 |
| Chad | 356.25112 | 0.0196637 | -3.9289821 |
| Chile | 299.68090 | 1.1648655 | 0.1526056 |
| China | 305.78283 | 0.8371480 | -0.1777544 |
| Colombia | 274.75874 | 0.5669058 | -0.5675621 |
| Comoros | 329.61973 | 0.0583049 | -2.8420686 |
| Congo, Dem. Rep. | 341.65291 | 0.0352735 | -3.3446221 |
| Congo, Rep. | 306.12960 | 0.1779821 | -1.7260725 |
| Costa Rica | 278.65794 | 0.5615336 | -0.5770836 |
| Cote d'Ivoire | 346.98834 | 0.1607130 | -1.8281351 |
| Croatia | 238.03789 | 1.3277578 | 0.2834917 |
| Cuba | 239.56677 | 0.7723139 | -0.2583642 |
| Cyprus | 209.90538 | 1.9542960 | 0.6700300 |
| Czech Republic | 220.15224 | 5.2377982 | 1.6559012 |
| Denmark | 147.32538 | 4.5645919 | 1.5183291 |
| Djibouti | 342.72825 | 0.2611031 | -1.3428398 |
| Dominica | 53.66575 | 0.3456143 | -1.0624317 |
| Dominican Republic | 312.21973 | 0.5868655 | -0.5329597 |
| Ecuador | 292.73498 | 0.5138206 | -0.6658810 |
| Egypt | 317.43543 | 0.4625695 | -0.7709584 |
| El Salvador | 326.72377 | 0.2956413 | -1.2186085 |
| Equatorial Guinea | 359.86816 | 0.7619910 | -0.2718205 |
| Eritrea | 345.57489 | 0.0729238 | -2.6183407 |
| Estonia | 205.93753 | 4.4824439 | 1.5001684 |
| Eswatini | 313.37937 | 0.2373857 | -1.4380692 |
| Ethiopia | 357.23049 | 0.0209955 | -3.8634464 |
| Fiji | 329.91076 | 0.4592780 | -0.7780995 |
| Finland | 186.78229 | 3.8995740 | 1.3608673 |
| France | 162.37700 | 4.0272780 | 1.3930907 |
| Gabon | 325.99283 | 1.2170448 | 0.1964257 |
| Gambia | 372.29821 | 0.0687848 | -2.6767732 |
| Georgia | 233.11040 | 0.8167220 | -0.2024565 |
| Germany | 229.78448 | 6.3502646 | 1.8484965 |
| Ghana | 349.55471 | 0.1608565 | -1.8272426 |
| Greece | 208.14655 | 1.8874978 | 0.6352520 |
| Grenada | 206.47848 | 0.3923049 | -0.9357159 |
| Guatemala | 363.13229 | 0.2333453 | -1.4552360 |
| Guinea | 369.68475 | 0.0740448 | -2.6030844 |
| Guinea-Bissau | 348.37444 | 0.0464484 | -3.0694126 |
| Guyana | 271.49417 | 0.7439776 | -0.2957444 |
| Haiti | 375.11749 | 0.0527175 | -2.9428080 |
| Honduras | 288.90807 | 0.2420179 | -1.4187434 |
| Hong Kong, China | 21.71986 | 2.9010045 | 1.0650571 |
| Hungary | 240.81704 | 2.6974664 | 0.9923130 |
| Iceland | 180.74794 | 2.8275247 | 1.0394017 |
| India | 351.83229 | 0.2609552 | -1.3434067 |
| Indonesia | 327.94081 | 0.3532466 | -1.0405888 |
| Iran | 377.38341 | 1.1978430 | 0.1805225 |
| Iraq | 313.98161 | 1.1182063 | 0.1117259 |
| Ireland | 152.76081 | 3.4560807 | 1.2401352 |
| Israel | 253.29700 | 2.6546502 | 0.9763129 |
| Italy | 243.21148 | 2.5398744 | 0.9321146 |
| Jamaica | 230.29552 | 0.8638565 | -0.1463486 |
| Japan | 219.69170 | 3.0719865 | 1.1223244 |
| Jordan | 298.61031 | 0.9714664 | -0.0289486 |
| Kazakhstan | 317.96915 | 4.3394350 | 1.4677442 |
| Kenya | 373.66502 | 0.1628700 | -1.8148032 |
| Kiribati | 272.47489 | 0.1575874 | -1.8477748 |
| Kuwait | 342.07493 | 9.7526906 | 2.2775432 |
| Kyrgyz Republic | 263.57175 | 1.1759193 | 0.1620502 |
| Lao | 329.20762 | 0.1677175 | -1.7854743 |
| Latvia | 217.02982 | 1.9157354 | 0.6501016 |
| Lebanon | 299.93682 | 0.8933408 | -0.1127871 |
| Lesotho | 309.94036 | 0.2969283 | -1.2142647 |
| Liberia | 353.10852 | 0.1338700 | -2.0108864 |
| Libya | 302.45426 | 2.2372018 | 0.8052259 |
| Liechtenstein | 21.24575 | 2.5136143 | 0.9217217 |
| Lithuania | 251.93484 | 2.4752242 | 0.9063310 |
| Luxembourg | 202.32336 | 9.7173722 | 2.2739152 |
| Madagascar | 338.22915 | 0.0517175 | -2.9619593 |
| Malawi | 366.96682 | 0.0703812 | -2.6538296 |
| Malaysia | 291.41170 | 1.2340269 | 0.2102827 |
| Maldives | 324.39493 | 0.3335291 | -1.0980250 |
| Mali | 412.50090 | 0.0243901 | -3.7135766 |
| Malta | 224.87215 | 2.7239507 | 1.0020833 |
| Marshall Islands | 67.14795 | 0.5554978 | -0.5878907 |
| Mauritania | 326.60269 | 0.1430628 | -1.9444717 |
| Mauritius | 242.85650 | 0.8357713 | -0.1794003 |
| Mexico | 326.68655 | 1.3734305 | 0.3173116 |
| Micronesia, Fed. Sts. | 215.15202 | 0.3242287 | -1.1263061 |
| Moldova | 231.82287 | 1.2841704 | 0.2501129 |
| Mongolia | 318.70000 | 1.4798117 | 0.3919148 |
| Montenegro | 246.16987 | 0.8760897 | -0.1322868 |
| Morocco | 303.78565 | 0.3517982 | -1.0446975 |
| Mozambique | 357.83274 | 0.1464305 | -1.9212044 |
| Myanmar | 333.33946 | 0.1638206 | -1.8089832 |
| Namibia | 293.19731 | 0.4525381 | -0.7928833 |
| Nauru | 69.70000 | 3.4392242 | 1.2352459 |
| Nepal | 325.18072 | 0.0465112 | -3.0680619 |
| Netherlands | 164.92390 | 4.9606682 | 1.6015404 |
| New Zealand | 144.92924 | 3.2775067 | 1.1870830 |
| Nicaragua | 357.10045 | 0.2547892 | -1.3673186 |
| Niger | 370.58296 | 0.0228610 | -3.7783235 |
| Nigeria | 363.45291 | 0.1572511 | -1.8499113 |
| North Korea | 332.15157 | 1.2094664 | 0.1901792 |
| North Macedonia | 258.76874 | 1.7188565 | 0.5416592 |
| Norway | 122.41036 | 3.1672063 | 1.1528499 |
| Oman | 322.01018 | 2.2998296 | 0.8328350 |
| Pakistan | 382.93812 | 0.1901839 | -1.6597640 |
| Palau | 37.85342 | 3.6630762 | 1.2983033 |
| Palestine | 307.80807 | 0.1181749 | -2.1355897 |
| Panama | 271.92108 | 0.4641250 | -0.7676014 |
| Papua New Guinea | 315.19013 | 0.1545830 | -1.8670244 |
| Paraguay | 254.11031 | 0.2817265 | -1.2668187 |
| Peru | 282.35157 | 0.5690000 | -0.5638748 |
| Philippines | 298.00135 | 0.3092466 | -1.1736161 |
| Poland | 231.20291 | 3.6753722 | 1.3016544 |
| Portugal | 235.22480 | 1.6005650 | 0.4703567 |
| Qatar | 297.75655 | 7.3184753 | 1.9904020 |
| Romania | 249.18300 | 1.7500807 | 0.5596619 |
| Russia | 269.54960 | 3.4704484 | 1.2442838 |
| Rwanda | 329.05605 | 0.0239462 | -3.7319461 |
| Samoa | 218.51166 | 0.2134529 | -1.5443390 |
| Sao Tome and Principe | 254.96816 | 0.1302287 | -2.0384631 |
| Saudi Arabia | 307.46054 | 3.9673274 | 1.3780927 |
| Senegal | 395.24439 | 0.1673049 | -1.7879372 |
| Serbia | 217.43511 | 1.5843004 | 0.4601430 |
| Seychelles | 181.79686 | 0.9933677 | -0.0066544 |
| Sierra Leone | 429.65291 | 0.0647040 | -2.7379317 |
| Singapore | 263.00704 | 6.4327489 | 1.8614020 |
| Slovak Republic | 225.74915 | 4.5391211 | 1.5127334 |
| Slovenia | 222.98291 | 2.5129238 | 0.9214469 |
| Solomon Islands | 357.53587 | 0.1097085 | -2.2099282 |
| Somalia | 362.48879 | 0.0270852 | -3.6087678 |
| South Africa | 293.02466 | 2.5586906 | 0.9394956 |
| South Korea | 334.21139 | 2.1194753 | 0.7511686 |
| South Sudan | 398.13184 | 0.0370448 | -3.2956261 |
| Spain | 231.76623 | 1.9819283 | 0.6840702 |
| Sri Lanka | 272.26969 | 0.2886413 | -1.2425707 |
| St. Kitts and Nevis | 59.94247 | 0.8058386 | -0.2158718 |
| St. Lucia | 220.13901 | 0.4149238 | -0.8796605 |
| St. Vincent and the Grenadines | 230.02691 | 0.3193408 | -1.1414964 |
| Sudan | 319.03094 | 0.1166996 | -2.1481526 |
| Suriname | 273.07309 | 1.4609686 | 0.3790996 |
| Sweden | 128.63444 | 3.7264305 | 1.3154508 |
| Switzerland | 166.26103 | 4.1204439 | 1.4159609 |
| Syria | 303.85067 | 0.6301390 | -0.4618148 |
| Taiwan | 296.09933 | 2.4094574 | 0.8794016 |
| Tajikistan | 255.55426 | 0.5052063 | -0.6827885 |
| Tanzania | 325.12556 | 0.0730628 | -2.6164362 |
| Thailand | 302.30291 | 0.5908296 | -0.5262276 |
| Timor-Leste | 363.87668 | 0.0569910 | -2.8648614 |
| Togo | 336.44709 | 0.0909552 | -2.3973887 |
| Tonga | 190.89417 | 0.2493049 | -1.3890785 |
| Trinidad and Tobago | 223.89283 | 4.8734843 | 1.5838091 |
| Tunisia | 338.26951 | 0.4442825 | -0.8112946 |
| Turkey | 290.41529 | 0.9963004 | -0.0037064 |
| Turkmenistan | 330.32287 | 4.0312960 | 1.3940879 |
| Tuvalu | 68.19726 | 0.2638565 | -1.3323499 |
| UAE | 309.10951 | 5.5005561 | 1.7048492 |
| UK | 153.93148 | 8.0481614 | 2.0854437 |
| USA | 177.84022 | 10.0326951 | 2.3058493 |
| Uganda | 386.88161 | 0.0422108 | -3.1650801 |
| Ukraine | 239.93363 | 2.4806592 | 0.9085243 |
| Uruguay | 242.11081 | 0.7128924 | -0.3384248 |
| Uzbekistan | 269.65426 | 1.9439327 | 0.6647131 |
| Vanuatu | 218.91480 | 0.2458475 | -1.4030437 |
| Venezuela | 269.91480 | 1.9177803 | 0.6511684 |
| Vietnam | 287.26144 | 0.2582825 | -1.3537013 |
| Yemen | 422.88161 | 0.2103677 | -1.5588983 |
| Zambia | 319.54350 | 0.5254260 | -0.6435459 |
| Zimbabwe | 292.04709 | 0.6607399 | -0.4143950 |
final_data <- final_data %>%
mutate(Decade = cut(year, breaks = seq(1953, 2023, 10),
labels = c("1953-1962", "1963-1972",
"1973-1982", "1983-1992",
"1993-2002", "2003-2012",
"2013-2022"),
include.lowest = TRUE))
final_data_means <- final_data %>%
filter(!is.na(Decade)) %>%
group_by(country, Decade) %>%
summarize(
mean_emissions = mean(emissions, na.rm = TRUE),
mean_child_mortality = mean(child_mortality, na.rm = TRUE),
.groups = 'drop'
)
ggplot(final_data_means, aes(x = mean_emissions,
y = mean_child_mortality)) +
geom_point() +
facet_wrap(Decade ~ ., scales = "free") +
labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
x = "Mean Emissions Per Capita (Log Transformed)",
y = "") +
scale_x_log10() +
theme_bw()# Snapshot Scatterplot
means_data %>%
ggplot(aes(x = log_mean_emissions, y = mean_child_mortality)) +
geom_point() +
theme_bw() +
labs(x = "Log of Mean Emissions", y = "",
title = "Scatter Plot of Mean Child Mortality vs Log of Mean Emissions by Country") +
geom_smooth(method = "lm", color = "blue")# New linear model
means_data_lm <- lm(mean_child_mortality ~ log_mean_emissions,
data = means_data)
tidy(means_data_lm)# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 260. 4.93 52.7 4.32e-116
2 log_mean_emissions -31.3 2.98 -10.5 1.07e- 20
# New scatterplot and histogram for residuals
means_data_lm %>%
augment() %>%
ggplot(aes(x = .fitted, y = .resid)) +
geom_point()means_data_lm %>%
augment() %>%
ggplot(aes(x = .resid)) +
geom_histogram(aes(y = ..density..), colour = "black", fill = "white") + # Removed binwidth
geom_density(alpha = .2, fill = "#FF6666") +
labs(x = "Residuals", y = "Density",
title = "Histogram of Residuals with Normal Density Curve") +
theme_bw()# New variances table
means_data_lm %>%
augment() %>%
summarise(
response_variance = var(mean_child_mortality, na.rm = TRUE),
fitted_variance = var(.fitted, na.rm = TRUE),
residuals_variance = var(.resid, na.rm = TRUE)
) %>%
pivot_longer(
everything(),
names_to = "Metric",
values_to = "Variance"
) %>%
mutate(
Metric = case_when(
Metric == "response_variance" ~ "Variance of Response",
Metric == "fitted_variance" ~ "Variance of Fitted Values",
Metric == "residuals_variance" ~ "Variance of Residuals",
TRUE ~ Metric # Fallback, should not be needed
)
)# A tibble: 3 × 2
Metric Variance
<chr> <dbl>
1 Variance of Response 6730.
2 Variance of Fitted Values 2458.
3 Variance of Residuals 4272.
predictions_means <- predict(means_data_lm)
# Generate simulated residuals
simulated_residuals_means <- rnorm(length(predictions_means), mean = 0, sd = sigma(means_data_lm))
# Create a dataframe with predicted values and simulated residuals
simulated_data_means <- data.frame(predicted = predictions_means, residuals = simulated_residuals_means)
# Plot simulated residuals vs predicted values to assess variance
ggplot(simulated_data_means, aes(x = predicted, y = residuals)) +
geom_point() +
theme_bw() +
labs(x = "Predicted Values", y = "Simulated Response Values",
title = "Simulated Response Values vs. Predicted Values")noise <- function(x, mean = 0, sd) {
return(x + rnorm(length(x), mean, sd))
}
# Generate 1000 simulations
st_error_means <- sigma(means_data_lm)
# Simulate the response variable
simulated_response_means <- map_dfc(.x = 1:1000, .f = ~ {
tibble(
simulated_child_mortality = noise(
x = predict(means_data_lm, means_data), # Use the original model predictions as a base
sd = st_error_means # Add noise based on the model's standard error
)
)
})
# Now, bind the simulated response with the original predictor
binded_simulations_means <- means_data %>%
select(country, log_mean_emissions) %>%
bind_cols(simulated_response_means)
# Iterate over each simulation to calculate R-squared values
new_col_names <- c("country", "log_mean_emissions", paste("simulated_child_mortality", 1:1000, sep = "_"))
names(binded_simulations_means) <- new_col_names
# Now we can run the map_dbl to calculate R-squared values
simulated_r_sq_means <- map_dbl(1:1000, ~ {
# Extract the simulated response for this iteration
col_name <- paste("simulated_child_mortality", .x, sep = "_")
# Fit the linear model using the simulated response
model <- lm(reformulate("log_mean_emissions", response = col_name), data = binded_simulations_means)
# Extract R-squared value
summary(model)$r.squared
})
# Plotting the histogram of R-squared values
tibble(simulations = simulated_r_sq_means) %>%
ggplot(aes(x = simulations)) +
geom_histogram(binwidth = 0.001, color = "black", fill = "skyblue") +
labs(x = expression("Simulated"~ R^2),
y = "Frequency",
subtitle = "Histogram of Simulated R-squared Values") +
theme_bw() +
scale_y_continuous()